# Project Overview
This project investigates how U.S. organizations frame AI in press releases concerning disaster response and management. Specifically, it examines how AI is framed in organizational press releases related to disaster management, what key themes and dominant narratives surround discussions of AI in the context of natural disasters and crises in the United States, and what potential risks and ethical concerns regarding AI usage are highlighted in these communications.
# Objective of the Project
Analyze how U.S. organizations frame AI in press releases during disaster response and management efforts.
RQ1. How do U.S. organizations frame AI in press releases in the context of disasters?
RQ2. What are the most prevalent frames about AI in organizational press releases related to disasters?
RQ3. How have these frames about AI changed over time?
RQ4. What sentiments are associated with AI, and how have they changed over time?
Data were collected from the Nexis Uni database, focusing on press releases disseminated through major news wires. The collection period spans from Nov 1, 2019, to Nov 1, 2024, using the keywords “artificial intelligence” OR “ai” OR “generative ai” OR “machine learning” OR “deep learning” AND “disaster” OR “disaster management” OR “disaster communication” OR “disaster response” OR “disaster preparedness”. A total of 7470 press releases published over the last five years were collected, with the following counts per year:
| No | Year | Number of Press Releases |
|----|------|--------------------------|
| 1 | 2024 | 1722 |
| 2 | 2023 | 1750 |
| 3 | 2022 | 1433 |
| 4 | 2021 | 1332 |
| 5 | 2019-2020 | 1333 |
The final dataset comprises 7470 press releases sourced mainly from major news wires.
These press releases explicitly mention “Artificial Intelligence” or “AI” in conjunction with disasters.
- Total Press Releases: 7470
- Average Releases Per Month: 110
- Top Mentioned Keywords: “Artificial Intelligence,” “AI,” “Disaster”
Dominant Headlines of the Press Releases:
- Ethical Concerns
- Public Trust
- AI Benefits
- AI Limitations
Regional Focus: United States
Organizations Highlighted:
- FEMA
- Department of Homeland Security
- Major Tech Companies
- Non-Governmental Organizations (NGOs)
A computational textual analysis approach is employed to analyze the collected press releases. The analysis is conducted in R, leveraging various libraries and packages to facilitate data processing and analysis. The key steps include:
Data Cleaning and Preparation:
Expected Outcomes
## Repository Highlights
library(tidyverse) # Includes dplyr, ggplot2, purrr, readr, stringr, etc.
library(pdftools)
library(textdata)
library(tidytext)
library(quanteda)
library(rio)
library(janitor)
here::here()
# topic modeling
library(tm)
library(topicmodels)
library(lda)
library(ldatuning)
# from tutorial packages
library(DT)
library(knitr)
library(kableExtra)
library(reshape2)
library(wordcloud)
library(pals)
library(SnowballC)
library(flextable)
# Defining the directory containing the PDFs
directory <- "../ai_disaster"
# Getting all PDF file paths in the directory
file_paths <- list.files(path = directory, pattern = "\\.PDF$", full.names = TRUE)
# Extracting the Text from PDFs: Combining the text from all PDFs.
combined_text <- sapply(file_paths, function(path) {
pdf_text(path) %>% paste(collapse = "\n")
}) %>% paste(collapse = "\n")
# Splitting and saving as documents: splitting the combined text by "End of Document" and saving as individual text files.
documents <- strsplit(combined_text, "End of Document")[[1]]
output_dir <- "../ai_disaster/extracted"
# Ensure the directory exists
if (!dir.exists(output_dir)) {
dir.create(output_dir, recursive = TRUE)
}
# Write each document to a text file
for (i in seq_along(documents)) {
output_file <- file.path(output_dir, paste0("FramingAI_extracted", i, ".txt"))
writeLines(documents[[i]], output_file)
}
cat("Files created:", length(documents), "\n")
## Files created: 7471
## Create a data frame with file names and their corresponding content
# List all text files in the output directory
extracted_files <- list.files(output_dir, pattern = "\\.txt$", full.names = TRUE)
# Read the content of each file into a list
documents_text <- lapply(extracted_files, function(file) {
readLines(file) %>% paste(collapse = "\n") # Combine lines into a single text block
})
# Create a data frame with file names and their corresponding content
documents_df <- tibble(
document_id = basename(extracted_files), # Extract file names (without path)
document_text = documents_text # Content of each document
)
# View the first few rows of the data frame
head(documents_df)
## # A tibble: 6 × 2
## document_id document_text
## <chr> <list>
## 1 FramingAI_extracted1.txt <chr [1]>
## 2 FramingAI_extracted10.txt <chr [1]>
## 3 FramingAI_extracted100.txt <chr [1]>
## 4 FramingAI_extracted1000.txt <chr [1]>
## 5 FramingAI_extracted1001.txt <chr [1]>
## 6 FramingAI_extracted1002.txt <chr [1]>
## Extracting Metadata and Building the Final Data Frame
## Building the Final Index
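The object `final_data` used in Step 3 below is built upstream; a minimal sketch of how it might be derived from the extracted text files follows. The metadata patterns (first line as title, first ISO-style date, a "Publication:" label) are assumptions about the Nexis Uni export layout, not the project's actual parsing code.

```r
# Hypothetical sketch: derive per-document metadata from the extracted files.
# The regular expressions below are assumptions about the export layout.
final_data <- tibble(
  index = seq_along(extracted_files),
  raw   = map_chr(extracted_files, ~ paste(readLines(.x), collapse = "\n"))
) %>%
  mutate(
    title       = str_trim(map_chr(str_split(raw, "\n"), 1)),        # first line as title
    date        = as.Date(str_extract(raw, "\\d{4}-\\d{2}-\\d{2}")), # first ISO-style date found
    publication = str_extract(raw, "(?<=Publication:\\s).*")         # assumed "Publication:" label
  ) %>%
  select(index, title, date, publication)
```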
# Step 1: Create a File Name List from File Paths
file_names <- basename(file_paths) # Extract just the file names from the full paths
# Step 2: Create a Mapping DataFrame
file_mapping <- tibble(
index = seq_along(file_names), # Use sequential indices for each file
filename = file_names # Map file names to indices
)
# Step 3: Join File Mapping with Final Data
final_index <- final_data |>
inner_join(file_mapping, by = "index") |> # Join based on index
mutate(
filepath = paste0("../ai_disaster/extracted/", filename) # Construct full file paths
)
# Step 4: Display the Head of the DataFrame with File Names and Content
print(head(final_index))
## # A tibble: 6 × 6
## index title date publication filename filepath
## <int> <chr> <date> <chr> <chr> <chr>
## 1 1 http://www.businesswire.com 2019-11-02 <NA> Framing… ../ai_d…
## 2 2 Page 1 of 2 2019-11-02 <NA> Framing… ../ai_d…
## 3 3 Webroot Announces Business End… 2019-11-02 <NA> Framing… ../ai_d…
## 4 4 Empower MSPs to Do Business Th… 2019-11-02 <NA> Framing… ../ai_d…
## 5 5 Webroot Announces Business End… 2019-11-02 <NA> Framing… ../ai_d…
## 6 6 SyncroMSP; New Integration Hel… 2019-11-02 <NA> Framing… ../ai_d…
# Rename 'document_id' to 'filename' for consistency
documents_df <- documents_df %>%
rename(filename = document_id)
# Merge final_index with documents_df on 'filename'
merged_data <- final_index %>%
left_join(documents_df, by = "filename")
# Check for any missing text data
missing_text <- merged_data %>%
filter(is.na(document_text))
if (nrow(missing_text) > 0) {
warning("There are documents with missing text data:")
print(missing_text)
} else {
cat("All documents have corresponding text data.\n")
}
## All documents have corresponding text data.
# Save the merged data to a CSV file for future use
write_csv(merged_data, "~/Desktop/Code/FramingAI/ai_disaster/final_merged_data.csv")
# Preview the merged data
head(merged_data)
## # A tibble: 6 × 7
## index title date publication filename filepath document_text
## <int> <chr> <date> <chr> <chr> <chr> <list>
## 1 1 http://www.busin… 2019-11-02 <NA> Framing… ../ai_d… <chr [1]>
## 2 2 Page 1 of 2 2019-11-02 <NA> Framing… ../ai_d… <chr [1]>
## 3 3 Webroot Announce… 2019-11-02 <NA> Framing… ../ai_d… <chr [1]>
## 4 4 Empower MSPs to … 2019-11-02 <NA> Framing… ../ai_d… <chr [1]>
## 5 5 Webroot Announce… 2019-11-02 <NA> Framing… ../ai_d… <chr [1]>
## 6 6 SyncroMSP; New I… 2019-11-02 <NA> Framing… ../ai_d… <chr [1]>
## Cleaning the Titles
## Cleaning the Document Text
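`final_data_cleaned`, tokenized below, is produced upstream; one possible sketch (an assumption, not the project's exact cleaning step) flattens the `document_text` list-column from `merged_data` and drops rows whose titles are extraction residue:

```r
# Assumed sketch: flatten the document_text list-column and drop residue titles
# (bare URLs and "Page N of M" artifacts seen in the merged-data preview).
final_data_cleaned <- merged_data %>%
  mutate(document_text = map_chr(document_text, ~ paste(.x, collapse = "\n"))) %>%
  filter(
    !str_detect(title, "^https?://"),
    !str_detect(title, "^Page \\d+")
  ) %>%
  rename(doc_index = index)
```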
library(tidytext)
library(dplyr)
library(stringr)
# Tokenize the document text into sentences
final_data_cleaned2 <- final_data_cleaned %>%
unnest_tokens(sentence, document_text, token = "sentences")
# Define patterns to filter out irrelevant content
irrelevant_patterns <- c(
"^\\s*$",
"^\\d+$",
"^Page \\d+",
"^\\d{1,2} of \\d{1,2}$",
"^©.*$",
"^[A-Za-z]{1}$",
"^[^a-zA-Z]+$",
"Disclaimer",
"Legal Notice",
"Table of Contents",
"For Immediate Release",
"Addendum",
"Appendix",
"Lorem Ipsum",
"^[-=_]{3,}$",
"^[A-Za-z]+\\s?\\d{1,2},\\s?\\d{4}$",
"^\\d{1,2}:\\d{2}\\s?(AM|PM)?$",
"^End of Document$",
"^Table \\d+$",
"^Chart \\d+$",
"^Figure \\d+$",
"^[^\\s]{20,}$"
)
# Process and filter the data
final_data_cleaned2 <- final_data_cleaned2 %>%
filter(!str_detect(sentence, paste(irrelevant_patterns, collapse = "|"))) %>% # Remove irrelevant patterns
filter(str_count(sentence, "\\w+") >= 3) %>% # At least 3 words
filter(str_count(sentence, "[^a-zA-Z0-9]") / nchar(sentence) < 0.5) %>% # Fewer than 50% non-alphanumeric characters
rowwise() %>% # Enable row-wise operations
mutate(
unique_chars = length(unique(str_split(sentence, "")[[1]]))
) %>% # Count unique characters per sentence
filter(unique_chars > 5) %>% # More than 5 unique characters
ungroup() %>% # Remove row-wise grouping
select(-unique_chars) %>% # Drop temporary column
distinct() # Remove duplicate rows
# View first rows of cleaned data
head(final_data_cleaned2)
## # A tibble: 6 × 7
## doc_index title date publication filename filepath sentence
## <int> <chr> <date> <chr> <chr> <chr> <chr>
## 1 1 Untitled Document… 2019-11-02 <NA> Framing… ../ai_d… "user n…
## 2 1 Untitled Document… 2019-11-02 <NA> Framing… ../ai_d… "imperi…
## 3 1 Untitled Document… 2019-11-02 <NA> Framing… ../ai_d… "rob's …
## 4 1 Untitled Document… 2019-11-02 <NA> Framing… ../ai_d… "americ…
## 5 1 Untitled Document… 2019-11-02 <NA> Framing… ../ai_d… "talkde…
## 6 1 Untitled Document… 2019-11-02 <NA> Framing… ../ai_d… "portma…
# Load required libraries
library(dplyr)
library(stringr)
library(tidytext)
# First pass: bigrams with light cleaning (a fuller removal pattern is applied below)
bigrams <- final_data_cleaned2 %>%
select(sentence) %>%
mutate(
sentence = str_squish(sentence), # Remove extra spaces
sentence = tolower(sentence),
sentence = str_replace_all(sentence, c(
"copyright" = "",
"new york times"="",
"publication"="",
"www.alt"="",
"http"=""))) %>%
unnest_tokens(bigram, sentence, token = "ngrams", n = 2) %>%
separate(bigram, c("word1", "word2"), sep = " ") %>%
filter(!word1 %in% stop_words$word) %>% # Filter out stop words
filter(!word2 %in% stop_words$word) %>%
count(word1, word2, sort = TRUE) %>%
filter(!is.na(word1) & !is.na(word2))
# Define the pattern to remove specific unwanted terms.
# Note: these alternatives are unanchored, so short entries such as "ac" and "af"
# also strip substrings inside longer words (hence "mhine learning" in the output below).
remove_pattern <- paste(
"title|pages|publication date|publication subject|publication type|issn|language of publication: english|",
"document url|copyright|news|service|initially|vol|issue|filed|ms|virginia|alexandria|last updated|database|startofarticle|af|rights|october|reserved|september|research articles|proquest document id|",
"classification|https|--|people|alt|article|page|based|language|english|length|words|publication|type|morg|york|times|'new york times'|publication info|illustration|date|caption|[0-9.]|new york times|identifier/keyword|twitter\\.|rauchway|keynes's|_ftn|enwikipediaorg|",
"wwwnytimescom|wwwoenbat|wwwpresidencyucsbedu|wwwalt|wwwthemoneyillusioncom|aaa|predated|a_woman_to_reckon_with_the_vision_and_legacy_of_fran|ab_se|",
"jcr:fec|ac|___________________|\\bwww\\b|[_]+",
sep = ""
)
# Process bigrams
bigrams <- final_data_cleaned2 %>%
select(sentence) %>%
mutate(
sentence = str_squish(sentence), # Remove extra spaces
sentence = tolower(sentence), # Convert to lowercase
sentence = str_replace_all(sentence, remove_pattern, ""), # Remove unwanted terms
sentence = str_replace_all(sentence, "- ", ""), # Remove trailing hyphens
sentence = str_replace_all(sentence, "\\b[a-zA-Z]\\b", "") # Remove single characters
) %>%
unnest_tokens(bigram, sentence, token = "ngrams", n = 2) %>%
separate(bigram, c("word1", "word2"), sep = " ") %>%
filter(!word1 %in% stop_words$word) %>% # Filter out stop words
filter(!word2 %in% stop_words$word) %>%
filter(!str_detect(word1, remove_pattern)) %>% # drop words matching the removal pattern (%in% would only test exact equality against the whole pattern string)
count(word1, word2, sort = TRUE) %>%
filter(!is.na(word1) & !is.na(word2)) # Filter out NAs
bigrams
## # A tibble: 1,184,674 × 3
## word1 word2 n
## <chr> <chr> <int>
## 1 artificial intelligence 8474
## 2 climate change 5702
## 3 national security 4135
## 4 air force 4020
## 5 assigned patent 3761
## 6 mhine learning 3739
## 7 indo pific 3635
## 8 pr wire 3092
## 9 disaster recovery 2968
## 10 homeland security 2879
## # ℹ 1,184,664 more rows
top_20_bigrams <- bigrams |>
top_n(20) |>
mutate(bigram = paste(word1, word2)) |> # single space between the two words
select(bigram, n)
## Selecting by n
ggplot(top_20_bigrams, aes(x = reorder(bigram, n), y = n, fill=n)) +
geom_bar(stat = "identity") +
theme(legend.position = "none") +
coord_flip() +
labs(title = "Top Two-Word Phrases in FramingAI Articles",
caption = "n=7470 Press Releases. Graphic by Taufiq Ahmad. 12-08-2024",
x = "Phrase",
y = "Count of terms")
### AFINN Lexicon Sentiment Analysis
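The data frame `sentiment_over_time` plotted below is computed upstream; a minimal sketch, under the assumption that AFINN scores are averaged per publication date over the cleaned sentence tokens:

```r
# Assumed sketch: average AFINN sentiment per date.
afinn <- get_sentiments("afinn")  # fetched via the textdata package

sentiment_over_time <- final_data_cleaned2 %>%
  unnest_tokens(word, sentence) %>%
  inner_join(afinn, by = "word") %>%
  group_by(date) %>%
  summarise(
    average_sentiment = mean(value),  # mean AFINN score per day
    sentence_count    = n(),          # scored tokens per day (used in the tooltip)
    .groups = "drop"
  )
```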
# Create the interactive ggplot
p1 <- ggplot(sentiment_over_time, aes(x = date, y = average_sentiment)) +
geom_line(color = "steelblue", linewidth = 1) +
geom_point(aes(text = paste0(
"Date: ", format(date, "%Y-%m-%d"), "<br>",
"Avg Sentiment: ", round(average_sentiment, 2), "<br>",
"Sentences: ", sentence_count
)),
color = "darkred", size = 2) +
geom_smooth(method = "loess", se = TRUE, color = "darkgreen", fill = "lightgreen", linewidth = 1) +
labs(
title = "Sentiment Analysis Over Time",
x = "Years",
y = "Average Sentiment Score"
) +
theme_minimal() +
theme(
axis.text.x = element_text(angle = 30, hjust = .05),
plot.title = element_text(hjust = 0.5, size = 10, face = "bold")
)
## Warning in geom_point(aes(text = paste0("Date: ", format(date, "%Y-%m-%d"), :
## Ignoring unknown aesthetics: text
# Convert ggplot to plotly object for interactivity
library(plotly)
interactive_plot <- ggplotly(p1, tooltip = "text") %>%
layout(
title = list(text = "Sentiment Analysis Over Time", x = 0.5),
xaxis = list(title = "Date"),
yaxis = list(title = "Sentiment Score"),
hovermode = "closest"
)
## `geom_smooth()` using formula = 'y ~ x'
# Display the interactive plot
interactive_plot
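`sentiment_year_month`, used by the heatmap below, is assumed to be a year/month aggregation of the daily scores; one possible sketch:

```r
# Assumed sketch: aggregate daily average sentiment into year/month cells.
library(lubridate)
sentiment_year_month <- sentiment_over_time %>%
  mutate(
    year  = year(date),
    month = month(date, label = TRUE)  # abbreviated month labels for the x-axis
  ) %>%
  group_by(year, month) %>%
  summarise(average_sentiment = mean(average_sentiment), .groups = "drop")
```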
# Create the heatmap
heatmap_plot <- ggplot(sentiment_year_month, aes(x = month, y = factor(year), fill = average_sentiment)) +
geom_tile(color = "white") + # White borders between tiles for clarity
scale_fill_gradient(low = "lightpink", high = "darkred", name = "Avg Sentiment") + # Custom red color scale
labs(
title = "Average Sentiment Over Years and Months",
x = "Month",
y = "Year"
) +
theme_minimal() + # Clean and minimal theme
theme(
axis.text.x = element_text(angle = 30, hjust = 0.5, size = 10), # Rotate x-axis labels for readability and adjust size
axis.text.y = element_text(size = 10), # Adjust y-axis text size
plot.title = element_text(hjust = 0.5, size = 14, face = "bold"), # Center and style the title
aspect.ratio = 0.6, # Adjust aspect ratio to make plot wider and tiles smaller
legend.position = "right", # Position legend on the right
legend.title = element_text(size = 8),
legend.text = element_text(size = 6)
)
# Display the heatmap
print(heatmap_plot)
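`text_tokenized` and `nrc_trust`, used in the join below, are assumed to come from tokenizing the cleaned sentences and filtering the NRC lexicon; a sketch:

```r
# Assumed sketch: unigram tokens (stop words removed) and the NRC "trust" terms.
text_tokenized <- final_data_cleaned2 %>%
  unnest_tokens(word, sentence) %>%
  anti_join(stop_words, by = "word")

nrc_trust <- get_sentiments("nrc") %>%  # fetched via the textdata package
  filter(sentiment == "trust") %>%
  select(word)
```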
# Perform the join for trust sentiment
trust_sentiment_over_time <- text_tokenized %>%
semi_join(nrc_trust, by = "word") %>%
count(date, sort = TRUE)
# Add-one smoothing on trust counts per date
trust_sentiment_summary <- trust_sentiment_over_time %>%
group_by(date) %>%
summarise(total_trust = sum(n) + 1) # add-one smoothing
# Visualization
ggplot(trust_sentiment_summary, aes(x = date, y = total_trust)) +
geom_line(color = "blue") +
geom_point(color = "darkblue") +
labs(
title = "Trust Sentiments Towards AI Over Time",
x = "Date",
y = "Trust Sentiment Count"
) +
theme_minimal()
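`sentiments_all` and `nrc_sentiments` (the latter also used in the positive/negative comparison further below) are assumed to be total token counts per NRC sentiment category; a sketch:

```r
# Assumed sketch: total token counts per NRC sentiment category.
nrc_sentiments <- get_sentiments("nrc")

sentiments_all <- text_tokenized %>%
  inner_join(nrc_sentiments, by = "word", relationship = "many-to-many") %>%
  count(sentiment, sort = TRUE)
```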
nrc_plot <- sentiments_all %>%
ggplot(aes(x = reorder(sentiment, n), y = n, fill = sentiment)) +
geom_bar(stat = "identity", position = "dodge", width = 0.7) +
geom_text(aes(label = n), hjust = 1.2, size = 4, color = "white") +
labs(
title = "Sentiment Analysis of Press Releases on AI and Disaster",
subtitle = "NRC Sentiment Analysis Breakdown",
x = "Sentiment",
y = "Sentiment Word Count"
) +
theme_minimal(base_size = 14) +
theme(
plot.title = element_text(face = "bold", size = 12, hjust = 0.5), # Center and bold title
plot.subtitle = element_text(size = 12, hjust = 0.5), # Center subtitle
plot.caption = element_text(hjust = 0, size = 10, face = "italic"),
axis.text.x = element_text(size = 10, color = "grey40"),
axis.text.y = element_text(size = 12, color = "grey40"),
panel.grid.major.x = element_line(color = "grey90", linetype = "dotted"), # Dotted grid lines
panel.grid.major.y = element_blank(), # Remove unnecessary grid lines
legend.position = "none" # Hide legend since colors are self-explanatory
) +
scale_fill_manual(values = c(
"positive" = "#1f77b4", # Blue
"negative" = "#d62728", # Red
"trust" = "#2ca02c", # Green
"fear" = "#9467bd", # Purple
"anticipation" = "#ff7f0e", # Orange
"anger" = "#e377c2" # Pink
)) + # Vibrant colors for each sentiment
coord_flip() # Flip for better readability
# Print the enhanced plot
print(nrc_plot)
library(ggplot2)
nrc_positive <- nrc_sentiments %>%
filter(sentiment == "positive")
FramingAI_positive <- text_tokenized %>%
inner_join(nrc_positive, by = "word") %>%
count(date, sort = TRUE) %>%
rename(positive_count = n)
nrc_negative <- nrc_sentiments %>%
filter(sentiment == "negative")
FramingAI_negative <- text_tokenized %>%
inner_join(nrc_negative, by = "word") %>%
count(date, sort = TRUE) %>%
rename(negative_count = n)
# Combine Positive and Negative Counts
sentiments_comparison <- FramingAI_positive %>%
full_join(FramingAI_negative, by = "date") %>%
replace_na(list(positive_count = 0, negative_count = 0)) # Replace NA with 0
# Reshape for Visualization
sentiments_long <- sentiments_comparison %>%
pivot_longer(cols = c(positive_count, negative_count), names_to = "sentiment", values_to = "count")
# Visualization
# Comparative Visualization of Positive and Negative Sentiments Over Time
# Improved Comparative Visualization of Positive and Negative Sentiments Over Time
ggplot(sentiments_long, aes(x = date, y = count, color = sentiment, group = sentiment)) +
geom_line(linewidth = 0.7) + # Reduced line thickness for a balanced appearance
geom_point(size = 0.5) + # Moderate-sized points for emphasis
scale_color_manual(
values = c("positive_count" = "#1f77b4", "negative_count" = "#d62728"), # Custom colors
labels = c("Positive Sentiments", "Negative Sentiments")
) +
labs(
title = "Sentiment Trends Over Time",
subtitle = "Comparing Positive and Negative Sentiments in Press Releases",
x = "Date",
y = "Sentiment Count",
color = "Sentiment Type"
) + theme_minimal(base_size = 10) + # Compact text size
theme(
plot.title = element_text(face = "bold", size = 12, hjust = 0.5), # Centered and proportional title
plot.subtitle = element_text(size = 10, hjust = 0.5), # Slightly smaller subtitle
axis.text.x = element_text(angle = 45, hjust = 1, size = 8), # Smaller x-axis text
axis.text.y = element_text(size = 8), # Smaller y-axis text
panel.grid.major = element_line(color = "grey85", linetype = "dotted"), # Subtle grid lines
legend.position = "top",
legend.title = element_text(size = 9), # Adjusted legend title size
legend.text = element_text(size = 8) # Adjusted legend text size
) +
ggplot2::annotate("text", x = max(sentiments_comparison$date) - 10, y = max(sentiments_comparison$positive_count) - 5,
label = "Positive Sentiments Lead", color = "#1f77b4", size = 2.5, fontface = "italic") +
ggplot2::annotate("text", x = max(sentiments_comparison$date) - 10, y = max(sentiments_comparison$negative_count) - 5,
label = "Negative Sentiments Spike", color = "#d62728", size = 2.5, fontface = "italic")
## Topic Modeling
# Relevant libraries for topic modeling (tm, topicmodels, lda, ldatuning) were loaded above
topic_data <- final_data_cleaned2 %>%
select(filename, sentence) %>%
as.data.frame() %>%
rename(doc_id = filename, text= sentence)
# load stopwords
english_stopwords <- readLines("https://slcladal.github.io/resources/stopwords_en.txt", encoding = "UTF-8")
# create corpus object
corpus <- Corpus(DataframeSource(topic_data))
# Preprocessing chain
processedCorpus <- tm_map(corpus, content_transformer(tolower))
processedCorpus <- tm_map(processedCorpus, removeWords, english_stopwords)
processedCorpus <- tm_map(processedCorpus, removePunctuation, preserve_intra_word_dashes = TRUE)
processedCorpus <- tm_map(processedCorpus, removeNumbers)
processedCorpus <- tm_map(processedCorpus, stemDocument, language = "en")
processedCorpus <- tm_map(processedCorpus, stripWhitespace)
# DTM: rows correspond to the documents in the corpus, columns to the terms in the documents, and cells to the weights of the terms.
# compute document term matrix with terms >= minimumFrequency
minimumFrequency <- 5
DTM <- DocumentTermMatrix(processedCorpus, control = list(bounds = list(global = c(minimumFrequency, Inf))))
# have a look at the number of documents and terms in the matrix
dim(DTM)
## [1] 611553 34782
# due to vocabulary pruning, we have empty rows in our DTM
# LDA does not like this. So we remove those docs from the
# DTM and the metadata
sel_idx <- slam::row_sums(DTM) > 0
DTM <- DTM[sel_idx, ]
topic_data <- topic_data[sel_idx, ]
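Before fixing K = 6, the ldatuning package loaded above can score candidate topic counts; a sketch (the candidate grid and metrics here are illustrative choices, not the project's settings, and on a DTM this large the search is expensive, so sampling documents first may be advisable):

```r
# Assumed sketch: evaluate candidate numbers of topics with ldatuning.
k_scores <- ldatuning::FindTopicsNumber(
  DTM,
  topics   = seq(4, 20, by = 2),              # candidate K values (an assumption)
  metrics  = c("CaoJuan2009", "Deveaud2014"),
  method   = "Gibbs",
  control  = list(seed = 9161),
  mc.cores = 2L
)
ldatuning::FindTopicsNumber_plot(k_scores)    # minimize CaoJuan2009, maximize Deveaud2014
```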
# number of topics
# K <- 20
K <- 6
# set random number generator seed
set.seed(9161)
#Latent Dirichlet Allocation, LDA
topicModel2 <- LDA(DTM, K, method="Gibbs", control=list(iter = 1000, verbose = 25, alpha = 0.2))
## K = 6; V = 34782; M = 606997
## Sampling 1000 iterations!
## Iteration 25 ...
## ...
## Iteration 975 ...
## Iteration 1000 ...
## Gibbs sampling completed!
tmResult <- posterior(topicModel2)
theta <- tmResult$topics
beta <- tmResult$terms
topicNames <- apply(terms(topicModel2, 10), 2, paste, collapse = " ") # reset topicnames
# Step 1: Check dimensions
n_theta <- nrow(theta)
n_topicdata <- nrow(topic_data)  # rows = documents (length() would return the number of columns)
cat("Number of rows in theta: ", n_theta, "\n")
## Number of rows in theta:  606997
cat("Number of documents in textdata: ", n_topicdata, "\n")
## Number of documents in textdata:  606997
# Check if textdata contains all the documents in theta
common_ids <- intersect(rownames(theta), topic_data$doc_id) # Assuming textdata has a 'doc_id' column
# Filter textdata to include only the documents present in theta
topicdata_filtered <- topic_data[topic_data$doc_id %in% common_ids, ]
# Check dimensions after filtering
n_topicdata_filtered <- nrow(topicdata_filtered)
cat("Number of documents in filtered textdata: ", n_topicdata_filtered, "\n")
## Number of documents in filtered textdata: 606997
# Align rownames of theta with filtered textdata
theta_aligned <- theta[rownames(theta) %in% topicdata_filtered$doc_id, ]
# Step 2: Combine data
full_data <- data.frame(theta_aligned, decade = topicdata_filtered)
# get mean topic proportions per decade
# topic_proportion_per_decade <- aggregate(theta, by = list(decade = textdata$decade), mean)
# set topic names to aggregated columns
colnames(full_data)[2:(K+1)] <- topicNames
# reshape data frame
vizDataFrame <- melt(full_data)
## Using data market servic technolog manag • cloud solut secur provid, decade.text as id variables
#Examine topic names
#enframe(): Converts a named list into a dataframe.
topics <- enframe(topicNames, name = "number", value = "text") %>%
unnest(cols = c(text))
topics
## # A tibble: 6 × 2
## number text
## <chr> <chr>
## 1 Topic 1 news servic page copyright california word target bodi length initi
## 2 Topic 2 research develop technolog climat system energi disast chang scienc w…
## 3 Topic 3 financi busi result year increas includ million compani oper insur
## 4 Topic 4 state forc unit secur countri nation support china region intern
## 5 Topic 5 program fund committe million support health hous state act includ
## 6 Topic 6 data market servic technolog manag • cloud solut secur provid
Theme 1. Climate and Disaster Risk Management through Technological and Developmental Research
Theme 2. Data-Driven Market and Cloud-Based Technological Solutions for Management and Security
Theme 3. Institutional (Academic/Governmental) Recognition and Innovation in Technological Development
Theme 4. Financial Performance, Insurance Coverage, and Operational Growth in the Corporate Sphere
Theme 5. International and National Security Forces, Regional Support, and Development Efforts
Theme 6. National Funding, Committee-Led Programs, and Government-Supported Initiatives
## Topic 1: Words: climat, disast, energi, develop, research, chang, technolog, impt, risk, system
Assessment: This cluster revolves around climate and disaster contexts, focusing on energy, development, research, and technological changes. The presence of words like “climat” and “disast” highlights environmental and crisis scenarios, while “technolog,” “develop,” and “research” suggest ongoing innovation and adaptation. Terms like “risk” and “system” imply structured approaches to managing vulnerabilities.
Theme: Climate and Disaster Risk Management through Technological and Developmental Research
Relevance to Project: As my project explores how AI is framed in disaster-related press releases, understanding how climate-related disasters are linked to energy systems, research, and technological changes is crucial. This topic indicates that AI might be positioned as a tool for risk assessment and strategic development in mitigating climate-driven disasters, further connecting energy and systems thinking to resilience planning.
## Topic 2: Words: data, market, technolog, manag, solut, cloud, secur, busi, provid
Assessment: This cluster centers on data-driven market and technological solutions. Words like “data,” “technolog,” “manag,” “cloud,” and “secur” hint at the infrastructure behind digital and analytic tools. “Market,” “busi,” and “provid” suggest commercial offerings and service provision in a technical domain, likely aiming to improve the management of crises through better data usage and secure, cloud-based solutions.
Theme: Data-Driven Market and Cloud-Based Technological Solutions for Management and Security
Relevance to Project: In disaster scenarios, AI tools often rely on secure, scalable, and cloud-based infrastructures to process large datasets. This topic suggests that organizational press releases may frame AI as part of a commercial, data-centric ecosystem, offering solutions that enhance situational awareness, market stability, and secure operations during crises.
## Topic 3: Words: univers, california, target, load, state, presid, award, bodi, develop, patent
Assessment: This cluster highlights academic and governmental elements: “univers,” “california,” “state,” and “presid” point to institutional settings. “Award,” “patent,” and “develop” indicate innovation and recognition of research achievements. “Target,” “load,” and “bodi” may relate to structural or logistical considerations. Overall, it suggests a network of universities, state bodies, and recognized innovations (patents and awards) focused on development.
Theme: Institutional (Academic/Governmental) Recognition and Innovation in Technological Development
Relevance to Project: AI’s framing may be influenced by academic research and patents recognized by states or universities. This topic indicates how organizational press releases might highlight university-based AI research, state-level acknowledgments, and patented technologies contributing to advanced disaster management strategies.
## Topic 4: Words: year, financi, busi, result, million, includ, increas, compani, oper, insur
Assessment: This cluster is strongly associated with financial and business outcomes. Words like “financi,” “busi,” “result,” “million,” and “increas” point to growth, investments, and company performance. “Insur” and “oper” refer to insurance and operations, suggesting risk management and protective measures. Collectively, it indicates a focus on economic performance, insurance coverage, and operational resilience.
Theme: Financial Performance, Insurance Coverage, and Operational Growth in the Corporate Sphere
Relevance to Project: In the context of disasters, press releases may highlight how AI-driven solutions contribute to financial stability, operational continuity, and insurance mechanisms. Understanding these financial narratives helps reveal how AI is framed as a critical tool for maintaining business resilience and mitigating disaster-related economic impacts.
## Topic 5: Words: state, forc, unit, secur, countri, nation, support, region, china, develop
Assessment: This cluster emphasizes geopolitical and security dimensions. Terms like “state,” “forc,” “unit,” “secur,” and “nation” suggest organized security forces or national defense strategies. “Region,” “china,” and “develop” indicate international context and development efforts. It suggests a global, national-security-oriented perspective, where multiple countries, including China, are involved in supportive or strategic roles.
Theme: International and National Security Forces, Regional Support, and Development Efforts
Relevance to Project: For disaster management, press releases might frame AI as integral to national and international coordination efforts. Understanding how AI supports security units, international collaborations, and development projects helps in seeing how AI’s role is communicated as part of broader geopolitical and disaster response strategies.
## Topic 6: Words: fund, committe, support, heh, million, program, nation, includ, hous, state
Assessment: This cluster centers on funding, committees, and national-level programs. Words like “fund,” “committe,” and “support” highlight financial and organizational backing. “Program,” “nation,” “state,” and “hous” suggest government-backed initiatives, possibly housing or resource allocation. “heh” may be a tokenization artifact, but the rest indicates structured, well-funded, national-level support programs.
Theme: National Funding, Committee-Led Programs, and Government-Supported Initiatives
Relevance to Project: AI might be introduced or expanded within such funded and committee-driven national programs to improve disaster readiness, resource distribution, and infrastructural support. This topic implies that organizational press releases could present AI as part of well-funded initiatives that enhance efficiency and resilience at the national and state levels.
Overall, these interpretations and themes give a sense of how AI in disaster scenarios is discussed across various contexts—academic, financial, national-security, and programmatic—offering insights into the multifaceted framing of AI in organizational press releases across the United States.